sourcecode

Persistent structural context and ultra-fast repeated analysis for AI coding agents.

The problem

Every time an AI coding agent starts a new session, it has to re-parse the repository from scratch. For a large Java or TypeScript monolith, that can mean 5–15 seconds per invocation. Multiply that by dozens of agent turns per hour, and repo context acquisition becomes a real bottleneck, costing not just latency but tokens, compute, and iteration velocity.

sourcecode solves this with a persistent structural cache keyed on file content hashes. After the first scan, every subsequent invocation returns pre-built context in a fraction of a second (0.3–0.6s in the measurements below). As long as the repo doesn't change, the cache doesn't expire.

The cache is more than a performance optimization: it is what makes sourcecode usable as infrastructure rather than a one-off tool.
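To make "keyed on content hashes" concrete, here is a minimal Python sketch of one way such a key could be derived. It is an illustration under assumptions, not sourcecode's actual scheme: the function names `cache_key` and `read_sources`, the choice of SHA-256, and the `.java`/`.ts` suffix filter are all hypothetical.

```python
import hashlib
from pathlib import Path


def cache_key(files: dict[str, bytes]) -> str:
    """Fold per-file SHA-256 digests into one repo-level key (hypothetical scheme).

    `files` maps a repo-relative POSIX path to that file's raw bytes.
    """
    combined = hashlib.sha256()
    # Sorted order makes the key independent of filesystem iteration order.
    for rel_path in sorted(files):
        digest = hashlib.sha256(files[rel_path]).hexdigest()
        # Include the path so that renames also change the key.
        combined.update(f"{rel_path}\0{digest}\n".encode())
    return combined.hexdigest()


def read_sources(root: Path, suffixes: tuple[str, ...] = (".java", ".ts")) -> dict[str, bytes]:
    """Collect source files under `root` (hypothetical suffix filter)."""
    return {
        p.relative_to(root).as_posix(): p.read_bytes()
        for p in root.rglob("*")
        if p.is_file() and p.suffix in suffixes
    }
```

With this scheme, `cache_key(read_sources(Path(".")))` stays identical across runs until some tracked file's bytes, name, or presence changes.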

Cache performance — measured on real repos

Repo	Size	Cold scan	Cache hit	Speedup
Keycloak	7,885 Java files	10.5s	0.6s	~17x
BroadleafCommerce	2,985 Java files	2.7s	0.3s	~9x

The cache is keyed on content hashes and invalidated only when source changes, so on repeated agent sessions against the same codebase nearly every invocation is a cache hit.

Token output (measured, approximate tokens per invocation):

Mode	BroadleafCommerce	Keycloak
`--compact`	~2,900	~4,000
`--agent`	~4,800	~5,500
`onboard`	~2,600	n/a
`fix-bug` (trimmed)	~27,000	~4,600

What changes at 0.3s vs 2.7s

At 2.7s per call, you use sourcecode to occasionally inspect a repo.

At 0.3s per call, you use sourcecode as constant infrastructure inside agent loops:

agent loop iteration:
  1. sourcecode . --compact          # 0.3s — instant structural context
  2. sourcecode impact PaymentService . --depth 1   # 0.4s — blast radius check
  3. agent makes targeted change
  4. repeat

Sub-second context retrieval changes the cost model for agent workflows. You can call sourcecode before every edit, before every PR review, before every test run — without batching or caching calls manually.

Installation

Homebrew (macOS / Linux)

brew tap haroundominique/sourcecode
brew install sourcecode

pip / pipx

pip install sourcecode
# or with isolation:
pipx install sourcecode

Verify

sourcecode version
# sourcecode 1.35.16

Quickstart

# High-signal summary — warm cache: ~0.3s, cold: 2–10s depending on repo size
sourcecode --compact

# Add git hotspots and uncommitted file count
sourcecode --compact --git-context

# Structured output for AI agents — bounded, noise-free, ready to inject
sourcecode --agent

# Blast radius: what breaks if this class changes?
sourcecode impact OrderService /path/to/repo

# Spring semantic audit: TX anomalies + security surface (free)
sourcecode spring-audit /path/to/repo

# Impact chain: systemic blast radius with TX/SEC enrichment (free)
sourcecode impact-chain OrderService /path/to/repo

# Event topology: publisher → event → consumer graph (free)
sourcecode impact-chain OrderPlacedEvent /path/to/repo --type events

# REST endpoint surface
sourcecode endpoints /path/to/repo

# Onboard to an unfamiliar codebase
sourcecode onboard /path/to/repo

# PR review: risk, test gaps, changed modules
sourcecode review-pr /path/to/repo --since main

# Bug triage: risk-ranked files by symptom
sourcecode fix-bug /path/to/repo --symptom "NullPointerException in checkout"

Cache system

sourcecode maintains a persistent cache at .sourcecode-cache/ inside each repository. Two layers:

L1 (core): analysis result keyed by (git_sha, analysis_flags). Survives format changes — you can regenerate --compact vs --agent views from the same core.
L2 (view): rendered output keyed by (core_hash, view_flags). Exact output match — no recomputation.

Lookup order: L2 exact hit → L1 hit + view rebuild → full cold scan
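The lookup order can be sketched in Python, with in-memory dicts standing in for the on-disk `.sourcecode-cache/` layers. Everything here is hypothetical (`TwoLayerCache`, `_key`, and the `analyze`/`render` callbacks). In particular, the sketch uses the L1 key as a stand-in for `core_hash`, which keeps the L2 probe cheap because it never has to load the core result.

```python
import hashlib
import json
from typing import Callable


def _key(*parts) -> str:
    # Deterministic key: SHA-256 over a canonical JSON encoding of the parts.
    return hashlib.sha256(json.dumps(parts, sort_keys=True).encode()).hexdigest()


class TwoLayerCache:
    """Hypothetical in-memory model of the L1 (core) / L2 (view) layers."""

    def __init__(self) -> None:
        self.core: dict[str, dict] = {}  # L1: (git_sha, analysis_flags) -> analysis result
        self.views: dict[str, str] = {}  # L2: (core key, view_flags) -> rendered output
        self.last_path = ""

    def get(
        self,
        git_sha: str,
        analysis_flags: list[str],
        view_flags: list[str],
        analyze: Callable[[], dict],
        render: Callable[[dict, list[str]], str],
    ) -> str:
        # Sorting flags makes "--a --b" and "--b --a" share a cache entry.
        core_key = _key(git_sha, sorted(analysis_flags))
        view_key = _key(core_key, sorted(view_flags))
        if view_key in self.views:  # 1. L2 exact hit: no recomputation
            self.last_path = "L2 hit"
            return self.views[view_key]
        if core_key in self.core:  # 2. L1 hit: only the view is rebuilt
            self.last_path = "L1 hit + view rebuild"
        else:  # 3. Full cold scan
            self.core[core_key] = analyze()
            self.last_path = "cold scan"
        output = render(self.core[core_key], sorted(view_flags))
        self.views[view_key] = output
        return output
```

Requesting `--agent` after `--compact` on the same commit takes path 2 in this sketch: the expensive `analyze()` runs once per `(git_sha, analysis_flags)` pair.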

Cache invalidation: Keyed on git commit SHA. Any commit invalidates the core cache for that repo. Uncommitted changes are not cached.

# Inspect cache state
sourcecode cache status

# Warm the cache ahead of an agent session
sourcecode cache warm

# Clear cache
sourcecode cache clear

--no-cache bypasses both layers and forces a fresh scan. Use in CI or when you need to verify a fresh result.

Visibility: Cache hits are silent. Use sourcecode cache status to see cache size, hit keys, and last-warmed timestamp.

Agent workflow patterns

Start of session — structural grounding

# Inject as first message to agent (bounded, deterministic)
sourcecode /repo --compact              # ~2,500–4,000 tokens
sourcecode /repo --agent               # ~4,500–5,500 tokens — more detail
sourcecode onboard /repo               # task-structured: entry points, key files, gaps

Before every change — blast radius + TX/SEC check

# Always target the INTERFACE in Spring projects, not the implementation:
sourcecode impact OrderService /repo           # ✓ 30 callers, 11 endpoints
sourcecode impact OrderServiceImpl /repo       # ✗ 0 callers (Spring DI blindness)

# Impact chain: blast radius enriched with TX boundary and security surfaces
sourcecode impact-chain OrderService /repo

# Event topology: who publishes/consumes this event, and in what TX phase?
sourcecode impact-chain OrderPlacedEvent /repo --type events

# Spring audit: catch TX anomalies before they hit production
sourcecode spring-audit /repo --scope tx

Continuous agent loop — delta context

# Only changed files + their transitive importers — minimal token cost:
sourcecode prepare-context delta /repo --since HEAD~1
sourcecode . --changed-only --git-context
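Conceptually, "changed files plus their transitive importers" is a closure over the reversed import graph. The following Python sketch shows that idea on a hypothetical in-memory map of file → imported files. `delta_context` is an illustrative name, and the sketch ignores how sourcecode actually obtains the changed set from git.

```python
from collections import deque


def delta_context(imports: dict[str, set[str]], changed: set[str]) -> set[str]:
    """Return the changed files plus every file that transitively imports one of them.

    `imports` maps each file to the files it imports (forward edges).
    """
    # Invert forward edges: importers[x] = files that import x.
    importers: dict[str, set[str]] = {}
    for src, targets in imports.items():
        for target in targets:
            importers.setdefault(target, set()).add(src)

    # Breadth-first walk from every changed file along reversed edges.
    result = set(changed)
    queue = deque(changed)
    while queue:
        current = queue.popleft()
        for parent in importers.get(current, ()):
            if parent not in result:
                result.add(parent)
                queue.append(parent)
    return result
```

A change to a leaf such as a repository class pulls in its service and controller, while an unrelated utility file stays out of the context.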

PR review — structured risk signal

# JSON for programmatic use:
sourcecode review-pr /repo --since main --output review.json
jq '.ci_decision' review.json    # "analysis_success" | "git_ref_error"

# Markdown for GitHub comment:
sourcecode review-pr /repo --since main --format github-comment

Bug triage — symptom-driven

# Specific symptoms produce the best signal:
sourcecode fix-bug /repo --symptom "OIDC token refresh fails after realm update"
sourcecode fix-bug /repo --symptom "NullPointerException in OrderService during checkout"

# Generic symptoms like the one below produce noisier output; be as specific as possible.
sourcecode fix-bug /repo --symptom "payment timeout" --output triage.json

In CI — cached, deterministic, fast

# Content-hash cached — safe to run on every commit; cold only when code changes
sourcecode /repo --compact --output context.json

# PR gate
sourcecode review-pr /repo --since "$BASE_REF" --output review.json
DECISION=$(jq -r '.ci_decision' review.json)
if [ "$DECISION" != "analysis_success" ]; then echo "Review failed: $DECISION"; exit 1; fi
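If the gate runs from a Python-based CI step rather than shell, the same check can be written without `jq`. This is a hypothetical helper that relies only on what is documented above: a `ci_decision` field whose success value is `"analysis_success"`.

```python
import json
import sys


def decide(review: dict) -> int:
    """Return a process exit code: 0 on analysis_success, 1 otherwise."""
    decision = review.get("ci_decision")
    if decision != "analysis_success":
        print(f"Review failed: {decision}", file=sys.stderr)
        return 1
    return 0


def gate(path: str = "review.json") -> int:
    """Read review-pr JSON output and fail closed on missing or invalid files."""
    try:
        with open(path, encoding="utf-8") as fh:
            review = json.load(fh)
    except (OSError, json.JSONDecodeError) as exc:
        print(f"Review failed: cannot read {path}: {exc}", file=sys.stderr)
        return 1
    return decide(review if isinstance(review, dict) else {})
```

In a CI step, `sys.exit(gate("review.json"))` fails the job on any other decision, and also when the file is missing or is not valid JSON.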

What sourcecode does (and doesn't)

sourcecode reduces exploration cost. It accelerates context acquisition and minimizes repeated repo parsing. It does not replace reading code — it reduces how often an agent needs to.

Specifically:

Extracts structural signals: entry points, Spring roles, REST surfaces, dependency graphs, transactional boundaries
Builds and caches these on first scan; serves from cache on subsequent calls
Produces bounded, noise-free JSON designed for direct injection into agent context windows
Computes blast radius (impact graph) from a class or interface, traversing reverse dependencies

What it does NOT do:

No runtime analysis — all signals are static (annotation, import graph, file structure)
No semantic code understanding — reads structure, not logic
No replacement for reading code — reduces how often that's needed, not whether
Architecture pattern detection works best for Spring MVC layered apps; SPI/plugin architectures (e.g. the Quarkus extension model) may be misclassified
Endpoint recall for the JAX-RS subresource locator pattern is ~65%
impact on implementation classes (e.g. OrderServiceImpl) returns 0 callers in Spring Boot, because callers inject the interface via @Autowired. Always target the interface: when impact returns direct_callers: [] with confidence_level: high for a @Service class, re-query against its interface.
no_security_signal on an endpoint means no method-level security annotations were found; it does not mean the endpoint is unsecured. Projects that secure endpoints through Spring Security filter chains show 100% no_security_signal even when fully secured.
spring-audit and impact-chain are Java/Spring only — non-Java repos return spring_detected: false
Event topology via --type events does not resolve Kafka/RabbitMQ/Redis message routes — only Spring ApplicationEvent and @EventListener chains
Self-invocation TX bypass (calling @Transactional method from the same class without going through the proxy) is not detected
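The interface-first rule for Spring projects can be automated on the consumer side. The Python sketch below works on a hypothetical in-memory caller map and an implementation → interfaces map. It is not how sourcecode resolves callers, only an illustration of the re-query advice above.

```python
def callers_with_interface_fallback(
    target: str,
    direct_callers: dict[str, set[str]],
    implements: dict[str, set[str]],
) -> tuple[str, set[str]]:
    """Return (symbol actually queried, its direct callers).

    direct_callers: symbol -> classes that import or inject it (hypothetical map).
    implements: implementation class -> interfaces it implements (hypothetical map).
    """
    found = direct_callers.get(target, set())
    if found:
        return target, found
    # Spring callers usually inject the interface, so retry on each interface.
    for iface in sorted(implements.get(target, ())):
        iface_callers = direct_callers.get(iface, set())
        if iface_callers:
            return iface, iface_callers
    return target, set()
```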

Command reference

`--compact` and `--agent`

Core flags. Feed directly to AI agents as first-message context.

Flag	Output	Tokens
`--compact`	High-signal summary: stacks, entry points, dependencies, confidence, gaps	~2,500–4,000
`--agent`	Structured JSON: identity, entry points, architecture, event flows	~4,500–5,500

`impact` — blast-radius analysis

sourcecode impact ClassName /path/to/repo
sourcecode impact org.example.OrderService /path/to/repo   # FQN also accepted
sourcecode impact OrderService . --depth 2                 # limit BFS depth

Field	Description
`direct_callers`	Classes that directly import or inject the target
`indirect_callers`	Transitive callers up to `--depth` (default: 4)
`endpoints_affected`	HTTP endpoints whose call chain includes the target
`transactional_boundaries_touched`	`@Transactional` classes in the blast cone
`mappers_affected`	`@Repository` / `@Mapper` / DAO classes in the blast cone
`security_surface_affected`	Security policies on affected endpoints
`cross_module_impact`	Subsystems touched, ordered by affected symbol count
`risk_score`	0–100 quantified change risk
`confidence_score`	0–1 confidence in the analysis
`explanation`	Human-readable risk summary
`candidates`	On partial match: up to 10 FQNs ranked by relevance
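The traversal behind `direct_callers` and `indirect_callers` is a breadth-first search over reverse dependencies. Here is a minimal Python sketch, assuming a hypothetical `callers` map (symbol → symbols that import or inject it). Only the two field names and the default depth of 4 come from the table above; scoring fields such as `risk_score` are left out.

```python
from collections import deque


def blast_radius(callers: dict[str, set[str]], target: str, depth: int = 4) -> dict[str, list[str]]:
    """Depth-limited reverse BFS; direct callers count as hop 1."""
    seen = {target}
    direct = sorted(callers.get(target, set()))
    seen.update(direct)
    indirect: list[str] = []
    frontier = deque((caller, 1) for caller in direct)
    while frontier:
        symbol, hops = frontier.popleft()
        if hops >= depth:
            continue  # Hop limit reached: do not expand further.
        for caller in callers.get(symbol, set()):
            if caller not in seen:  # `seen` also guards against cycles.
                seen.add(caller)
                indirect.append(caller)
                frontier.append((caller, hops + 1))
    return {"direct_callers": direct, "indirect_callers": sorted(indirect)}
```

In this sketch, `depth=1` stops after the direct callers, which mirrors the `--depth 1` advice for targets with hundreds of callers.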

Best practices:

Target interfaces, not implementations: prefer impact OrderService over impact OrderServiceImpl
Use --depth 1 when the target has 200+ callers; the directly affected endpoints are already the most actionable signal
A second impact run on the same repo is significantly faster, because the cache applies to the underlying IR scan

`endpoints` — REST API surface

sourcecode endpoints /path/to/repo
sourcecode endpoints /path/to/repo --output endpoints.json

Extracts all Spring MVC (@GetMapping, @PostMapping, @RequestMapping, etc.) and JAX-RS (@GET, @POST, @Path) endpoint methods. Returns HTTP method, path, controller class, and handler method.
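The following toy illustration shows how Spring MVC paths combine a class-level `@RequestMapping` prefix with method-level mapping annotations. This Python sketch is regex-based and deliberately naive: it handles string-literal paths only, with no JAX-RS, no `@RequestMapping(method = ...)` on methods, and no nested classes. It is not sourcecode's extractor, and `extract_endpoints` is a hypothetical name.

```python
import re

HTTP_ANNOTATIONS = {
    "GetMapping": "GET",
    "PostMapping": "POST",
    "PutMapping": "PUT",
    "DeleteMapping": "DELETE",
    "PatchMapping": "PATCH",
}
_CLASS = re.compile(r"\bclass\s+(\w+)")
# Class-level prefix: @RequestMapping("...") directly before the class declaration.
_PREFIX = re.compile(r'@RequestMapping\(\s*"([^"]*)"\s*\)\s*(?:public\s+)?(?:final\s+)?class\b')
# Method-level shorthand annotation with a literal path, then a public method.
_METHOD = re.compile(
    r"@(" + "|".join(HTTP_ANNOTATIONS) + r')\(\s*"([^"]*)"\s*\)\s*public\s+[\w<>\[\], ?]+?\s+(\w+)\s*\('
)


def extract_endpoints(java_source: str) -> list[tuple[str, str, str, str]]:
    """Return (http_method, path, controller_class, handler_method) tuples."""
    class_match = _CLASS.search(java_source)
    controller = class_match.group(1) if class_match else "?"
    prefix_match = _PREFIX.search(java_source)
    prefix = prefix_match.group(1) if prefix_match else ""
    endpoints = []
    for annotation, sub_path, handler in _METHOD.findall(java_source):
        # Join prefix and sub-path with exactly one slash between segments.
        parts = [p.strip("/") for p in (prefix, sub_path) if p.strip("/")]
        endpoints.append((HTTP_ANNOTATIONS[annotation], "/" + "/".join(parts), controller, handler))
    return endpoints
```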

`spring-audit` — Spring semantic audit [free]

sourcecode spring-audit /path/to/repo
sourcecode spring-audit /path/to/repo --scope tx          # TX anomalies only
sourcecode spring-audit /path/to/repo --scope security     # security surface only
sourcecode spring-audit /path/to/repo --min-severity high

Detects structural Spring anomalies that tend to survive code review and tests but cause failures in production:

Pattern	Description
`TX-001`	`@Transactional` on private/final method — CGLIB proxy bypass, TX silently ignored
`TX-002`	`REQUIRES_NEW` nested inside `REQUIRED` call chain — unexpected transaction nesting
`TX-003`	`readOnly=true` boundary propagating to write operation
`TX-004`	`NOT_SUPPORTED`/`NEVER` called within active TX chain
`TX-005`	Exception swallowing inside `@Transactional` — silent TX rollback suppression
`SEC-001`	Unsecured endpoint in annotation-based security model
`SEC-002`	CVE-2025-41248: `@PreAuthorize` on inherited method from generic supertype
`SEC-003`	`@Transactional` on `@Controller`/`@RestController` — TX in wrong layer

Returns structured findings with severity, confidence, symbol, source_file, evidence, explanation, and fix_hint. Java/Spring only.
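As an illustration of the kind of structural check TX-001 describes, here is a naive Python sketch that flags `@Transactional` methods declared `private` or `final`. It is a regex over source text, not sourcecode's detector, and `find_tx001` is a hypothetical name; a real implementation would work on a parsed AST.

```python
import re

# @Transactional (optionally with arguments), possibly followed by other
# annotations, then one or more modifiers, a return type, and a method name.
_TX_METHOD = re.compile(
    r"@Transactional(?:\([^)]*\))?\s+(?:@\w+(?:\([^)]*\))?\s+)*"
    r"((?:(?:public|protected|private|static|final|synchronized)\s+)+)"
    r"[\w<>\[\], ?]+?\s+(\w+)\s*\("
)


def find_tx001(java_source: str) -> list[str]:
    """Return names of @Transactional methods that a proxy cannot intercept."""
    hits = []
    for modifiers, method in _TX_METHOD.findall(java_source):
        mods = modifiers.split()
        # Proxies cannot intercept private methods or override final ones.
        if "private" in mods or "final" in mods:
            hits.append(method)
    return hits
```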

`impact-chain` — systemic blast radius with TX/SEC enrichment [free]

sourcecode impact-chain OrderService /path/to/repo
sourcecode impact-chain com.example.OrderService#placeOrder /path/to/repo
sourcecode impact-chain PaymentService . --depth 6

Unlike impact (which traces the caller graph), impact-chain builds on the SpringSemanticModel to enrich every step of the blast cone with transaction and security context:

Field	Description
`direct_callers`	Symbols that directly call the target
`indirect_callers`	Transitive callers (BFS up to `--depth` hops, default: 4)
`endpoints_affected`	HTTP endpoints reachable through the call chain
`transaction_boundary`	`@Transactional` semantics on the target: propagation, isolation, readOnly
`security_surfaces`	Per-endpoint security policy + SEC finding IDs
`impact_findings`	TX-001..005 and SEC-001..003 findings that touch the call chain
`risk_level`	`critical`, `high`, `medium`, or `low`

Event topology — query the publisher/consumer graph for a Spring event class:

sourcecode impact-chain OrderPlacedEvent /path/to/repo --type events

Field	Description
`publishers`	FQNs that publish this event class
`consumers`	Listeners with TX phase metadata (`AFTER_COMMIT`, `BEFORE_COMMIT`, etc.)
`event_graph`	Publisher → event → consumer edges (BFS ≤ 2 hops)
`transaction_context`	`AFTER_COMMIT` consumers, `BEFORE_COMMIT` risks
`risk_level`	Derived from TX phase and consumer count
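The publisher → event → consumer shape can be sketched in Python from two hypothetical maps: which classes publish which events, and which listeners consume each event in which TX phase. The `publishers`, `consumers`, and `event_graph` keys mirror the table above. `after_commit_consumers` is an illustrative extra, and no risk-level rule is attempted because the documented derivation is not specified.

```python
def event_topology(
    publishes: dict[str, set[str]],
    listens: dict[str, list[tuple[str, str]]],
    event: str,
) -> dict:
    """Build a two-hop publisher -> event -> consumer view for one event class.

    publishes: publisher FQN -> event classes it publishes (hypothetical map).
    listens: event class -> list of (listener FQN, TX phase) pairs (hypothetical map).
    """
    publishers = sorted(p for p, events in publishes.items() if event in events)
    consumers = sorted(listens.get(event, []))
    # Hop 1: publisher -> event; hop 2: event -> consumer.
    edges = [(p, event) for p in publishers] + [(event, c) for c, _ in consumers]
    after_commit = [c for c, phase in consumers if phase == "AFTER_COMMIT"]
    return {
        "publishers": publishers,
        "consumers": consumers,
        "event_graph": edges,
        "after_commit_consumers": after_commit,
    }
```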

Limitations of event topology:

Resolves Spring ApplicationEvent / @EventListener chains only
Does not trace Kafka, RabbitMQ, Redis, or other message brokers
Does not detect self-invocation proxy bypass
Conditional beans (@ConditionalOnProperty) are not evaluated at analysis time

`cold-start` — RIS bootstrap context

sourcecode cold-start /path/to/repo
sourcecode cold-start /path/to/repo --compact   # ~10K token subset

Returns the Repository Intelligence Snapshot (RIS) instantly, with no re-analysis. The RIS is built by a prior cache-warming pass and includes stacks, entry points, endpoint surface, and Spring semantic signals. The status field is one of cold_start_ready, cold_start_stale, or no_ris.

Use --compact to get a ~10K token subset that is safe for direct LLM injection. The full snapshot can exceed 100K tokens on medium-sized repos; write it to a file with --output FILE for local search tooling instead.

`repo-ir` — symbol-level IR

sourcecode repo-ir /path/to/repo --summary-only          # ~20K tokens
sourcecode repo-ir /path/to/repo --since HEAD~1           # symbol-level diff
sourcecode repo-ir /path/to/repo --files src/.../OrderService.java

Builds a deterministic symbol graph: classes, methods, import/injection edges, Spring roles, subsystems.

Size warning: Without --summary-only, output can exceed 1MB for mid-size repos. Always use --summary-only unless you need the full graph for downstream tooling.
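A symbol-level diff, as produced by `--since`, amounts to comparing two symbol tables. The Python sketch below assumes a hypothetical representation (FQN → signature string, using the `Class#method` notation from the impact-chain examples) and is not the `repo-ir` output format.

```python
def symbol_diff(before: dict[str, str], after: dict[str, str]) -> dict[str, list[str]]:
    """Compare two symbol tables mapping FQN -> signature string (hypothetical format)."""
    return {
        "added": sorted(after.keys() - before.keys()),
        "removed": sorted(before.keys() - after.keys()),
        # Present on both sides, but the signature differs.
        "changed": sorted(k for k in before.keys() & after.keys() if before[k] != after[k]),
    }
```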

`onboard` — codebase orientation

sourcecode onboard /path/to/repo

Entry points, architecture summary, key files, confidence level, and gaps. Designed to be injected as agent context at the start of a session.

`review-pr` — [Pro] PR review context

sourcecode review-pr /path/to/repo --since main
sourcecode review-pr /path/to/repo --since HEAD~3

Changed files, risk ranking, test coverage gaps, affected modules, and blast radius of changed classes. Returns a ci_decision field for CI/CD integration.

`fix-bug` — [Pro] Bug triage context

sourcecode fix-bug /path/to/repo --symptom "NullPointerException in checkout"

Returns a risk-ranked file list correlated to the symptom, using keyword extraction, path matching, content matching, and git commit correlation.
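The first three signals can be sketched in Python as a simple scorer. The tokenizer, stopword list, and the 2:1 weighting of path hits over content hits are illustrative assumptions, not sourcecode's ranking. Git commit correlation is omitted.

```python
import re

# Hypothetical stopword list for symptom text.
STOPWORDS = {"in", "the", "a", "an", "of", "during", "after", "fails", "when", "on"}


def keywords(symptom: str) -> set[str]:
    # Split CamelCase identifiers, acronyms, and plain words into lowercase tokens.
    tokens = re.findall(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])|\d+", symptom)
    return {t.lower() for t in tokens if t.lower() not in STOPWORDS and len(t) > 2}


def rank_files(symptom: str, files: dict[str, str]) -> list[tuple[str, int]]:
    """Rank files (path -> content) by keyword hits; path hits weigh double."""
    kws = keywords(symptom)
    scored = []
    for path, content in files.items():
        path_hits = sum(kw in path.lower() for kw in kws)
        content_hits = sum(kw in content.lower() for kw in kws)
        score = 2 * path_hits + content_hits
        if score:
            scored.append((path, score))
    # Highest score first; ties broken by path for deterministic output.
    return sorted(scored, key=lambda item: (-item[1], item[0]))
```

Splitting `NullPointerException` into `null`, `pointer`, and `exception` also shows why generic symptoms are noisy: those tokens match almost everywhere, while `checkout` does the real narrowing.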

`modernize` — [Pro] Modernization planning

sourcecode modernize /path/to/repo

High-coupling nodes (high fan-in = risky to change), dead zone candidates (isolated symbols), subsystem tangles.
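Fan-in and isolation are straightforward graph metrics. This Python sketch computes both over a hypothetical list of (caller, callee) edges. The threshold of 3 is an arbitrary illustrative value, and subsystem tangle detection is not attempted.

```python
def coupling_report(
    edges: list[tuple[str, str]],
    symbols: set[str],
    fan_in_threshold: int = 3,
) -> dict[str, list[str]]:
    """edges are (caller, callee) pairs; symbols is the full set of known symbols."""
    fan_in = {s: 0 for s in symbols}
    connected: set[str] = set()
    for caller, callee in edges:
        fan_in[callee] = fan_in.get(callee, 0) + 1
        connected.update((caller, callee))
    return {
        # Many incoming edges: changes here ripple widely.
        "high_coupling": sorted(s for s, n in fan_in.items() if n >= fan_in_threshold),
        # No edges at all: candidate dead code.
        "dead_zone_candidates": sorted(symbols - connected),
    }
```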

`prepare-context` — task-specific context

Low-level access to all tasks with full options:

sourcecode prepare-context TASK [PATH] [OPTIONS]

Task	What it surfaces
`explain`	Architecture, entry points, key dependencies
`onboard`	Full structural context for new agents/developers
`fix-bug`	Files ranked by symptom correlation, risk, annotations
`refactor`	Structural issues, improvement opportunities
`generate-tests`	Source files without test pairs, coverage gap analysis
`review-pr`	PR diff with risk ranking, test gaps, module impact
`delta`	Incremental context: git-changed files + transitive import graph
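For `generate-tests`, "source files without test pairs" depends on a pairing convention. The Python sketch below assumes the common Maven layout and a `Test` suffix (`src/main/java/pkg/Foo.java` paired with `src/test/java/pkg/FooTest.java`). sourcecode's actual pairing rules may differ, and `untested_sources` is a hypothetical name.

```python
from pathlib import PurePosixPath


def untested_sources(paths: list[str]) -> list[str]:
    """Return main sources with no FooTest.java counterpart (assumed Maven convention)."""
    tests = {p for p in paths if p.startswith("src/test/java/")}
    missing = []
    for p in paths:
        if p.startswith("src/main/java/") and p.endswith(".java"):
            rel = PurePosixPath(p).relative_to("src/main/java")
            expected = "src/test/java/" + str(rel.with_name(rel.stem + "Test.java"))
            if expected not in tests:
                missing.append(p)
    return sorted(missing)
```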

Flags reference

Flag	Alias	Default	Description
`--compact`		off	High-signal summary (typically 2,500–4,000 tokens for mid-to-large Java repos): stacks, entry points, dependencies, confidence, gaps.
`--agent`		off	Structured JSON for AI agents: project identity, entry points, architecture, dependencies, confidence. ~4,500–5,500 tokens.
`--full`		off	Remove truncation limits on `transactional_boundaries`, `mybatis.dto_mappers`, and other capped lists.
`--git-context`	`-g`	off	Include git activity: recent commits, change hotspots, and uncommitted file count.
`--changed-only`		off	Limit output to git-modified files (staged, unstaged, untracked).
`--depth`		`4`	File tree traversal depth (1–20). Java/Maven projects auto-adjust to 12.
`--format`	`-f`	`json`	Output format: `json` or `yaml`.
`--output`	`-o`	stdout	Write output to a file instead of stdout.
`--no-cache`		off	Bypass scan cache and force a fresh analysis.
`--copy`	`-c`	off	Copy output to clipboard after a successful run.
`--no-redact`		off	Disable automatic secret redaction.
`--version`	`-v`	—	Show version and exit.

Output schema

All outputs include:

schema_version: output format version
confidence_summary: overall, stack, entry_points confidence levels (high/medium/low)
analysis_gaps: list of what could not be analyzed and why
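A consumer that injects sourcecode output into an agent may want to sanity-check these common fields first. This Python sketch assumes `confidence_summary` is an object mapping `overall`, `stack`, and `entry_points` to level strings, which the description above implies but does not spell out. `check_common_fields` is a hypothetical name.

```python
VALID_LEVELS = {"high", "medium", "low"}


def check_common_fields(output: dict) -> list[str]:
    """Return a list of problems found in the common output fields (empty if none)."""
    problems = [
        f"missing {key}"
        for key in ("schema_version", "confidence_summary", "analysis_gaps")
        if key not in output
    ]
    summary = output.get("confidence_summary")
    if not isinstance(summary, dict):
        summary = {}  # Assumed shape: an object of level strings.
    for key in ("overall", "stack", "entry_points"):
        level = summary.get(key)
        if level is not None and level not in VALID_LEVELS:
            problems.append(f"unexpected confidence level {key}={level!r}")
    if not isinstance(output.get("analysis_gaps", []), list):
        problems.append("analysis_gaps is not a list")
    return problems
```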

Java/Spring-specific fields (when detected)

Field	Description
`language_version`	Java version from `maven.compiler.source` or equivalent
`deployment.spring_boot_version`	Spring Boot version
`deployment.packaging`	`jar` or `war`
`mybatis`	Mapper interface / XML file pairing summary
`transactional_boundaries`	Classes annotated with `@Transactional`
`deployment_risks`	Static risk flags: `spring-boot-2.x-eol`, `legacy-java-runtime`

Telemetry

Anonymous, opt-in. Collects: version, OS, commands, flags, duration, repo size range, errors. No source code, paths, secrets, or output content.

sourcecode telemetry status
sourcecode telemetry enable
sourcecode telemetry disable

Or: export SOURCECODE_TELEMETRY=0

Configuration

sourcecode config    # show version, config file path, telemetry status
